Semi-Parametric Batched Global Multi-Armed Bandits with Covariates
The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. Moreover, in many practical applications, such as personalized medicine and recommendation systems, feedback is provided in batches, contextual information is available at the time of decision-making, and rewards from different arms are related rather than independent. We propose a novel semi-parametric framework for batched bandits with covariates and a shared parameter across arms, leveraging the single-index regression (SIR) model to capture relationships between arm rewards while balancing interpretability and flexibility. Our algorithm, Batched single-Index Dynamic binning and Successive arm elimination (BIDS), employs a batched successive arm elimination strategy with a dynamic binning mechanism guided by the single-index direction. We consider two settings: one where a pilot direction is available and another where the direction is estimated from data, deriving theoretical regret bounds for both cases. When a pilot direction is available with sufficient accuracy, our approach achieves minimax-optimal rates (with $d = 1$) for nonparametric batched bandits, circumventing the curse of dimensionality. Extensive experiments on simulated and real-world datasets demonstrate the effectiveness of our algorithm compared to the nonparametric batched bandit method introduced by \cite{jiang2024batched}.
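The batched successive-elimination backbone of such algorithms is easy to sketch. The toy version below (function names and the Hoeffding-style confidence radius are my own choices; it omits BIDS's covariate binning and single-index direction entirely) pulls every surviving arm equally in each batch and, at each batch boundary, discards any arm whose upper confidence bound falls below the best arm's lower confidence bound:

```python
import math

def batched_successive_elimination(pull, n_arms, n_batches, pulls_per_arm, delta=0.05):
    """Toy batched successive arm elimination for rewards in [0, 1]."""
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(n_batches):
        for a in active:                      # pull each surviving arm equally
            for _ in range(pulls_per_arm):
                sums[a] += pull(a)
                counts[a] += 1
        # Hoeffding-style confidence radius (union bound over arms and batches)
        rad = {a: math.sqrt(math.log(2 * n_arms * n_batches / delta) / (2 * counts[a]))
               for a in active}
        best_lcb = max(sums[a] / counts[a] - rad[a] for a in active)
        active = [a for a in active
                  if sums[a] / counts[a] + rad[a] >= best_lcb]
    return active
```

With well-separated deterministic rewards, suboptimal arms are eliminated after the first batch and only the best arm survives.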
Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern
Tang, Hongyin, Xiu, Di, Wang, Lanrui, Geng, Xiurui, Wang, Jingang, Cai, Xunliang
The quadratic computational complexity of the attention mechanism in current Large Language Models (LLMs) renders inference with long contexts prohibitively expensive. To address this challenge, various approaches aim to retain critical portions of the context to optimally approximate Full Attention (FA) through Key-Value (KV) compression or Sparse Attention (SA), enabling the processing of virtually unlimited text lengths in a streaming manner. However, these methods struggle to achieve performance levels comparable to FA, particularly in retrieval tasks. In this paper, our analysis of attention head patterns reveals that LLMs' attention distributions show strong local correlations, naturally reflecting a chunking mechanism for input context. We propose Ltri-LLM framework, which divides KVs into spans, stores them in an offline index, and retrieves the relevant KVs into memory for various queries. Experimental results on popular long text benchmarks show that Ltri-LLM can achieve performance close to FA while maintaining efficient, streaming-based inference.
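The span-and-retrieve idea can be illustrated with a toy index. In this sketch (function names and the mean-vector span summary are illustrative assumptions, not Ltri-LLM's actual scoring), cached key vectors are grouped into contiguous spans, each span is summarized by its mean key, and a query retrieves the best-scoring spans by dot product:

```python
def build_span_index(keys, span_len):
    """Group cached key vectors into contiguous spans; summarize each by its mean."""
    spans = [keys[i:i + span_len] for i in range(0, len(keys), span_len)]
    reps = [[sum(col) / len(s) for col in zip(*s)] for s in spans]
    return spans, reps

def retrieve_spans(query, reps, top_k):
    """Score span summaries against the query; return indices of the best spans."""
    scores = [sum(q * r for q, r in zip(query, rep)) for rep in reps]
    return sorted(range(len(reps)), key=lambda i: -scores[i])[:top_k]
```

Only the retrieved spans' KVs would then be loaded into memory for attention, keeping the working set bounded regardless of context length.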
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence
Simsek, Berfin, Bendjeddou, Amire, Hsu, Daniel
This work focuses on the gradient flow dynamics of a neural network model that uses correlation loss to approximate a multi-index function on high-dimensional standard Gaussian data. Specifically, the multi-index function we consider is a sum of neurons $f^*(x) \!=\! \sum_{j=1}^k \! \sigma^*(v_j^T x)$ where $v_1, \dots, v_k$ are unit vectors, and $\sigma^*$ lacks the first and second Hermite polynomials in its Hermite expansion. It is known that, for the single-index case ($k\!=\!1$), overcoming the search phase requires polynomial time complexity. We first generalize this result to multi-index functions characterized by vectors in arbitrary directions. After the search phase, it is not clear whether the network neurons converge to the index vectors or get stuck at a sub-optimal solution. When the index vectors are orthogonal, we give a complete characterization of the fixed points and prove that neurons converge to the nearest index vectors. Therefore, using $n \! \asymp \! k \log k$ neurons ensures finding the full set of index vectors with gradient flow with high probability over random initialization. When $v_i^T v_j \!=\! \beta \! \geq \! 0$ for all $i \neq j$, we prove the existence of a sharp threshold $\beta_c \!=\! c/(c+k)$ at which the fixed point that computes the average of the index vectors transitions from a saddle point to a minimum. Numerical simulations show that a correlation loss and a mild overparameterization suffice to learn all of the index vectors when they are nearly orthogonal; however, the correlation loss fails when the dot product between the index vectors exceeds a certain threshold.
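To see where the polynomial search time comes from in the single-index case ($k=1$), one can use the standard Hermite calculus for Gaussian data. Writing $\sigma^* = \sum_{l \ge 3} c_l h_l$ in the normalized Hermite basis and $m = w^\top v$ for a single neuron $w$ on the sphere, the identity $\mathbb{E}[h_l(v^\top x)\, h_{l'}(w^\top x)] = \delta_{l,l'}\,(v^\top w)^l$ reduces the population correlation loss and its spherical gradient flow to (a sketch, with constants suppressed):

```latex
L(w) \;=\; -\,\mathbb{E}_{x \sim \mathcal{N}(0, I_d)}\!\big[\sigma^*(v^\top x)\,\sigma^*(w^\top x)\big]
\;=\; -\sum_{l \ge 3} c_l^2\, m^l,
\qquad
\dot m \;=\; (1 - m^2)\,\sum_{l \ge 3} l\, c_l^2\, m^{l-1}.
```

From a random initialization $m_0 \asymp d^{-1/2}$, the dominant term $m^{p-1}$ (with $p \ge 3$ the first nonzero Hermite index of $\sigma^*$) makes the escape time of the population flow scale like $m_0^{-(p-2)} = d^{(p-2)/2}$, which is the polynomially long search phase.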
Model-free Subsampling Method Based on Uniform Designs
Zhang, Mei, Zhou, Yongdao, Zhou, Zheng, Zhang, Aijun
Subsampling or subdata selection is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods which significantly depend on the model assumption. In this paper, we consider the model-free subsampling strategy for generating subdata from the original full data. In order to measure the goodness of representation of a subdata with respect to the original data, we propose a criterion, generalized empirical F-discrepancy (GEFD), and study its theoretical properties in connection with the classical generalized L2-discrepancy in the theory of uniform designs. These properties allow us to develop a kind of low-GEFD data-driven subsampling method based on the existing uniform designs. By simulation examples and a real case study, we show that the proposed subsampling method is superior to the random sampling method. Moreover, our method keeps robust under diverse model specifications while other popular subsampling methods are under-performing. In practice, such a model-free property is more appealing than the model-based subsampling methods, where the latter may have poor performance when the model is misspecified, as demonstrated in our simulation studies.
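A minimal sketch of design-based, model-free subsampling (not the paper's GEFD criterion, and the function name is my own) is: rescale a low-discrepancy uniform design to the data's coordinate-wise ranges, then select, for each design point, the nearest data row. The selected subdata then spreads evenly over the region the full data occupies, independent of any model:

```python
def uniform_design_subsample(data, design):
    """Pick, for each design point in [0, 1]^p, the nearest data row (by
    squared Euclidean distance after rescaling to the data's ranges)."""
    p = len(data[0])
    lo = [min(row[j] for row in data) for j in range(p)]
    hi = [max(row[j] for row in data) for j in range(p)]
    chosen = []
    for u in design:
        target = [lo[j] + u[j] * (hi[j] - lo[j]) for j in range(p)]
        best = min(range(len(data)),
                   key=lambda i: sum((data[i][j] - target[j]) ** 2 for j in range(p)))
        chosen.append(best)
    return chosen
```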
Estimating covariance and precision matrices along subspaces
We study the accuracy of estimating the covariance and the precision matrix of a $D$-variate sub-Gaussian distribution along a prescribed subspace or direction using the finite sample covariance with $N \geq D$ samples. Our results show that the estimation accuracy depends almost exclusively on the components of the distribution that correspond to the desired subspaces or directions. This is relevant for problems where the behavior of data along a lower-dimensional space is of specific interest, such as dimension reduction or structured regression problems. As a by-product of the analysis, we reduce the effect of the matrix condition number on the estimation of precision matrices. Two applications are presented: direction-sensitive eigenspace perturbation bounds, and estimation of the single-index model. For the latter, we propose a new estimator, derived from the analysis, with strong theoretical guarantees and superior numerical performance.
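The directional quantity at stake is just the quadratic form $u^\top \Sigma u$, the variance of the distribution along a unit direction $u$; its natural estimator projects the samples onto $u$ first and takes the sample variance of the resulting scalars. A minimal sketch (name and signature are illustrative):

```python
def directional_variance(samples, u):
    """Unbiased estimate of u^T Sigma u: sample variance of the projections
    of the data onto the unit direction u."""
    n = len(samples)
    proj = [sum(ui * xi for ui, xi in zip(u, x)) for x in samples]
    mean = sum(proj) / n
    return sum((p - mean) ** 2 for p in proj) / (n - 1)
```

The point of the paper's analysis is that the error of such directional estimates is governed by the distribution's behavior along $u$, not by the full $D$-dimensional spectrum.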
Multi-owner Secure Encrypted Search Using Searching Adversarial Networks
Chen, Kai, Lin, Zhongrui, Wan, Jian, Xu, Lei, Xu, Chungen
Searchable symmetric encryption (SSE) for the multi-owner model draws much attention as it enables data users to perform searches over encrypted cloud data outsourced by data owners. However, implementing secure and precise query, efficient search, and flexible dynamic system maintenance at the same time in SSE remains a challenge. To address this, this paper proposes secure and efficient multi-keyword ranked search over encrypted cloud data for the multi-owner model based on searching adversarial networks. We exploit searching adversarial networks to achieve optimal pseudo-keyword padding and obtain the optimal game equilibrium for query precision and privacy protection strength. A maximum likelihood search balanced tree is generated by probabilistic learning, which achieves efficient search and brings the computational complexity close to O(log N). In addition, we enable flexible dynamic system maintenance with a balanced index forest that makes full use of distributed computing. Compared with previous works, our solution maintains query precision above 95% while ensuring adequate privacy protection, and introduces low overhead on computation, communication, and storage.
Nonlinear generalization of the single index model
Kereta, Zeljko, Klock, Timo, Naumova, Valeriya
The single index model is a powerful yet simple model, widely used in statistics, machine learning, and other scientific fields. It models the regression function as $g(\langle a, x\rangle)$, where $a$ is an unknown index vector and $x$ are the features. This paper deals with a nonlinear generalization of this framework that allows for a regressor using multiple index vectors, adapting to local changes in the responses. To do so, we exploit the conditional distribution over function-driven partitions and use linear regression to locally estimate index vectors. We then regress by applying a kNN-type estimator that uses a localized proxy of the geodesic metric. We present theoretical guarantees for the estimation of local index vectors and out-of-sample prediction, and demonstrate the performance of our method with experiments on synthetic and real-world data sets, comparing it with state-of-the-art methods.
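A classical starting point for estimating an index vector by linear regression is Brillinger's observation that, for (near-)Gaussian features, the least-squares slope in $y = g(\langle a, x\rangle) + \varepsilon$ is proportional to $a$. The dependency-free sketch below (the paper's per-partition localization and geodesic-proxy regression are omitted; the helper name is my own) solves the normal equations and normalizes the result:

```python
def estimate_index_ols(X, y):
    """Estimate the index direction of y = g(a^T x) as the normalized
    least-squares coefficient vector (Brillinger-style, global version)."""
    p = len(X[0])
    # normal equations: (X^T X) b = X^T y
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination without pivoting (adequate for this small sketch)
    for i in range(p):
        for j in range(i + 1, p):
            f = A[j][i] / A[i][i]
            A[j] = [aj - f * ai for aj, ai in zip(A[j], A[i])]
            b[j] -= f * b[i]
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    norm = sum(c * c for c in coef) ** 0.5
    return [c / norm for c in coef]
```

Applying this estimator separately on each element of a function-driven partition is, roughly, how one obtains local index vectors.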
Random Indexing K-tree
De Vries, Christopher M., De Vine, Lance, Geva, Shlomo
The purpose of this paper is to present and analyse the combination of Random Indexing (RI) with the K-tree algorithm. Both RI and K-tree adapt to changing data and decrease the cost of computationally intensive vector based applications. This combination is particularly suitable to the representation and clustering of very large document collections. Documents are typically represented in vector space as very sparse high dimensional vectors. RI can reduce the dimensionality and sparsity of this representation. In turn, the condensed representation is highly effective when working with K-tree. The paper is focused on determining the effectiveness of using RI with K-tree through experiments and comparative analysis of results. Sections 2 to 6 discuss K-tree, Random Indexing, Document Representation, Experimental Setup and Experimental results respectively. The paper ends with a conclusion in Section 7.
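The Random Indexing construction itself is simple: each term is assigned a sparse ternary index vector (a few +1s and −1s in random positions), and a document vector is the sum of the index vectors of its tokens. A minimal sketch (dimension and sparsity parameters are illustrative, and term weighting is omitted):

```python
import random

def random_index_vectors(vocab, dim, nnz, seed=0):
    """Assign each term a sparse ternary index vector with nnz nonzero
    entries, half +1 and half -1, in random positions."""
    rng = random.Random(seed)
    vecs = {}
    for term in vocab:
        v = [0] * dim
        for k, j in enumerate(rng.sample(range(dim), nnz)):
            v[j] = 1 if k < nnz // 2 else -1
        vecs[term] = v
    return vecs

def document_vector(tokens, vecs):
    """Represent a document as the sum of its tokens' index vectors."""
    dim = len(next(iter(vecs.values())))
    doc = [0] * dim
    for t in tokens:
        for j, x in enumerate(vecs[t]):
            doc[j] += x
    return doc
```

Because the sparse random vectors are nearly orthogonal in high dimensions, these low-dimensional sums approximately preserve similarities between the original sparse term-count vectors, which is what makes them suitable input for K-tree.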